AN UNEXPECTED ENCOUNTER
WITH CAUCHY AND LEVY 


By Natesh S. Pillai and Xiao-Li Meng

Department of Statistics, Harvard University

The Cauchy distribution is usually presented as a mathematical
curiosity, an exception to the Law of Large Numbers, or even as an
“Evil” distribution in some introductory courses. It therefore surprised
us when Drton and Xiao (2016) proved the following result
for $m = 2$ and conjectured it for $m \ge 3$. Let $X = (X_1, \ldots, X_m)$ and
$Y = (Y_1, \ldots, Y_m)$ be i.i.d. $N(0, \Sigma)$, where $\Sigma = \{\sigma_{ij}\} \ge 0$ is an arbitrary
$m \times m$ covariance matrix with $\sigma_{jj} > 0$ for all $1 \le j \le m$. Then

$$Z = \sum_{j=1}^m w_j \frac{X_j}{Y_j} \sim \mathrm{Cauchy}(0,1),$$

as long as $w = (w_1, \ldots, w_m)$ is independent of $(X, Y)$, $w_j \ge 0$, $j =
1, \ldots, m$, and $\sum_{j=1}^m w_j = 1$. In this note, we present an elementary
proof of this conjecture for any $m \ge 2$ by linking $Z$ to a geometric
characterization of Cauchy(0,1) given in Williams (1969). This general
result is essential to the large sample behavior of Wald tests in many
applications such as factor models and contingency tables. It also
leads to other unexpected results such as


$$\sum_{i=1}^m \sum_{j=1}^m \frac{w_i w_j \sigma_{ij}}{X_i X_j} \sim \mathrm{Levy}(0,1).$$


This generalizes the “super Cauchy phenomenon” that the average 
of m i.i.d. standard Levy variables (i.e., inverse chi-squared variables 
with one degree of freedom) has the same distribution as that of a 
single standard Levy variable multiplied by $m$ (which is obtained by
taking $w_j = 1/m$ and $\Sigma$ to be the identity matrix).


1. Cauchy Distribution: Evil or Angel? Many of us may recall the 
surprise or even a mild shock we experienced the first time we encountered 
the Cauchy distribution. What does it mean that it does not have a mean? 
Surely one can always take a sample average, and surely it should converge 
to something by the Law of Large Numbers (LLN). But then we learned 
that the LLN does not apply to the Cauchy distribution. Obviously there
cannot be an upper bound on how things may vary in general, and hence
it is not difficult to imagine a distribution with infinite variance. But the
non-existence of the mean, which is not the same as the mean being infinite,
is harder to envision intuitively. Therefore, some introductory courses (e.g., 
at our institution) have given the Cauchy distribution the nickname “Evil”, 
because it has created a few excruciating moments (no pun intended), even 
for some of our best young minds when they tried hard to understand the 
meaning of not having a mean. 

Of course gain often comes with pain, because soon we would learn
something deeper. The non-existence of the mean for Cauchy is a reflection of
the fact that the sample average of an i.i.d. Cauchy sample actually does
converge, except it does not converge to a conventional mean, i.e., a
deterministic number. Rather, it converges trivially to a Cauchy random variable,
and more surprisingly, the limiting distribution is the same as that for each 
term in the entire sequence, as indexed by the sample size. In this sense 
the Cauchy distribution is as nice as an angel, because probabilistically its 
sample average sequence never deviates from its starting point, a dream case 
for anyone who studies probabilistic behavior of a random sequence. 
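The claim that the sample average of an i.i.d. Cauchy sample has the same distribution for every sample size can be illustrated with a short Monte Carlo sketch. It is an illustration only: the helper names are hypothetical, and Cauchy(0,1) draws are generated by the inverse-CDF transform $\tan(\pi(U - 1/2))$ with $U$ uniform on $[0, 1)$.

```python
# Illustrative sketch; all function names are hypothetical.
import math
import random


def cauchy_draw(rng: random.Random) -> float:
    """One Cauchy(0,1) draw via the inverse CDF: tan(pi * (U - 1/2))."""
    return math.tan(math.pi * (rng.random() - 0.5))


def cauchy_average(n: int, rng: random.Random) -> float:
    """Sample average of n i.i.d. Cauchy(0,1) draws."""
    return sum(cauchy_draw(rng) for _ in range(n)) / n


def empirical_cdf(values, x: float) -> float:
    """Fraction of the values that are <= x."""
    return sum(v <= x for v in values) / len(values)


if __name__ == "__main__":
    rng = random.Random(2016)
    for n in (1, 10, 100):
        averages = [cauchy_average(n, rng) for _ in range(10_000)]
        # For Cauchy(0,1), P(. <= 1) = 1/2 + arctan(1)/pi = 0.75, whatever n is.
        print(n, round(empirical_cdf(averages, 1.0), 3))
```

Whatever the sample size, the printed proportions should hover around 0.75, whereas for a distribution with a finite mean the averages would concentrate as $n$ grows.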

Through settling a conjecture set forth in [4], we prove in this article
that this nice property can hold even when the i.i.d. assumption is violated
(and the terms are not trivially identical). Specifically, let $\Sigma = \{\sigma_{ij}\} \ge 0$
with $\sigma_{jj} > 0$ for all $j = 1, \ldots, m$, and let $X, Y$ be independent random vectors
distributed as $N(0, \Sigma)$. We denote the row vectors as $X = (X_1, \ldots, X_m)$ and
$Y = (Y_1, \ldots, Y_m)$. Let $w = (w_1, \ldots, w_m)$ be a random vector such that

$$w \perp (X, Y); \quad \sum_{j=1}^m w_j = 1; \quad \text{and} \quad w_j \ge 0,\ j = 1, \ldots, m. \qquad (1.1)$$

Define the random variable 

$$Z = \sum_{j=1}^m w_j \frac{X_j}{Y_j}. \qquad (1.2)$$

In [4], the $w_j$'s were assumed to be fixed constants, but by a conditioning
argument, it is trivial to generalize from deterministic w to a random w as 
long as it is independent of (X, Y). Therefore, throughout this article we will 
present the more general random (but independent) w version of the results 
presented in [4] and in the literature with fixed w, whenever appropriate. 

When $\Sigma = \sigma^2 I_{m \times m}$, it is well known that $Z$ has the standard Cauchy
distribution on $\mathbb{R}$ with pdf $1/[\pi(1 + z^2)]$, denoted by Cauchy(0,1); Cauchy($\mu$, $\sigma$)
then denotes the distribution of $\mu + \sigma Z$. The fascination with this result is
evident from the number of different approaches proposed in the literature
to prove it, such as by characteristic functions, convolutions, multivariate
change of variables, and most recently by a trigonometric approach [2].
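For the diagonal case, the characteristic-function argument mentioned above takes only a few lines. The sketch below treats $w$ as fixed (the random case follows by conditioning) and uses the standard fact that the characteristic function of Cauchy(0,1) is $e^{-|t|}$. When $\Sigma = \sigma^2 I_{m \times m}$, the ratios $C_j = X_j/Y_j$ are i.i.d. Cauchy(0,1), because the common scale $\sigma$ cancels, so

$$
\begin{aligned}
E\left[\exp\left(it \sum_{j=1}^m w_j C_j\right)\right]
  &= \prod_{j=1}^m E\left[e^{i t w_j C_j}\right]
   = \prod_{j=1}^m e^{-w_j |t|} \\
  &= \exp\left(-|t| \sum_{j=1}^m w_j\right) = e^{-|t|},
\end{aligned}
$$

which is again the characteristic function of Cauchy(0,1). The first equality uses the independence of the $C_j$, which is exactly what is lost for a general $\Sigma$.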

Nevertheless, for arbitrary $\Sigma$ (so that the terms $X_j/Y_j$, $j = 1, \ldots, m$, are
no longer independent in general), few would expect that $Z$ might remain
Cauchy(0,1). However, through simulations Drton and Xiao [4] suspected
that this was indeed the case, and via a rather complex and indirect
argument involving the Residue theorem, they were able to prove it when $m = 2$
and in some cases of perfect correlation when $m > 2$. They conjectured that
the result should hold for $m > 2$ for an arbitrary $\Sigma$, but their argument does
not seem to be easily generalizable to the $m > 2$ case, nor was it feasible to
invoke induction because of the dependence induced by $\Sigma$.

Seriously intrigued by the findings and the conjecture in [4], we worked for 
a while trying to extend their complex analytic approach. By using copulas of 
Cauchy distributions and also the Residue theorem, we ultimately succeeded 
in finding a proof for all $m > 2$. However, we were not satisfied with our
lengthy proof because it did not provide any geometric interpretation or
statistical insight. We therefore continued to search for a simpler and more
inspiring approach. Thanks to an elegant but less well-known geometric
characterization of Cauchy(0,1) given in [14] and in [2, 15], we are able to
provide an elementary and geometrically appealing proof of the following
result, conjectured by Drton and Xiao in [4].

Theorem 1.1. For any $\Sigma = \{\sigma_{ij}\} \ge 0$ such that $\sigma_{jj} > 0$ for all $j =
1, \ldots, m$, and $w$ satisfying (1.1), the random variable $Z$ defined in (1.2) is
distributed as Cauchy(0,1).
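Theorem 1.1 can be sanity-checked by simulation. The sketch below is illustrative only: the covariance matrix is an arbitrary positive definite example (an assumption, not taken from [4]), the weights are redrawn from a flat Dirichlet distribution independently of $(X, Y)$, and all helper names are hypothetical. It compares the empirical CDF of $Z$ with the Cauchy(0,1) CDF $1/2 + \arctan(z)/\pi$.

```python
# Illustrative sketch; helper names and the example Sigma are hypothetical.
import math
import random


def cholesky(sigma):
    """Lower-triangular L with L L^T = sigma (sigma must be positive definite)."""
    m = len(sigma)
    L = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = sigma[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def gaussian_vector(L, rng):
    """One draw from N(0, L L^T), returned as a list (a row vector)."""
    z = [rng.gauss(0.0, 1.0) for _ in range(len(L))]
    return [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(L))]


def simplex_weights(m, rng):
    """Flat Dirichlet weights: w_j >= 0 and sum_j w_j = 1, drawn independently of (X, Y)."""
    g = [rng.expovariate(1.0) for _ in range(m)]
    total = sum(g)
    return [gj / total for gj in g]


def draw_z(L, rng):
    """Z = sum_j w_j X_j / Y_j with X, Y i.i.d. N(0, Sigma) and w independent of both."""
    x, y = gaussian_vector(L, rng), gaussian_vector(L, rng)
    w = simplex_weights(len(L), rng)
    return sum(wj * xj / yj for wj, xj, yj in zip(w, x, y))


def cauchy_cdf(z):
    """CDF of Cauchy(0,1)."""
    return 0.5 + math.atan(z) / math.pi


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical correlated, heteroscedastic covariance matrix (positive definite).
    sigma = [[1.0, 0.6, -0.4], [0.6, 2.0, 0.5], [-0.4, 0.5, 1.5]]
    L = cholesky(sigma)
    zs = [draw_z(L, rng) for _ in range(20_000)]
    for q in (-3.0, -1.0, 0.0, 1.0, 3.0):
        emp = sum(z <= q for z in zs) / len(zs)
        print(f"P(Z <= {q:+.0f}): empirical {emp:.3f}, Cauchy(0,1) {cauchy_cdf(q):.3f}")
```

Changing the off-diagonal entries of `sigma` (while keeping it positive definite) should leave the empirical column close to the Cauchy column. A simulation of course only illustrates the theorem; it does not prove it.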

A theoretical speculation from this unexpected result is that, for a set of
random variables, the dependence among them can be overwhelmed by the
heaviness of their marginal tails (e.g., the ratios $X_j/Y_j$) in
determining the stochastic behavior of their linear combinations. We invite
the reader to ponder with us whether this is a pathological phenomenon or
something profound.

2. Applications and prior work. As discussed in [4], the $Z$ in (1.2)
naturally appears in many important applications. Following [4], let $q \in
\mathbb{R}[x_1, \ldots, x_m]$ be a homogeneous $m$-variate polynomial with gradient $\nabla q$.
Then by the $\delta$-method, the variance of $q(X) = q(X_1, \ldots, X_m)$ can be
approximated by $\nabla q(X)\,\Sigma\,\nabla q(X)^\top$, resulting in the Wald statistic

$$W_{q,\Sigma} = \frac{q(X)^2}{\nabla q(X)\,\Sigma\,\nabla q(X)^\top} = \left\{\nabla \log q(X)\,\Sigma\,\nabla \log q(X)^\top\right\}^{-1}. \qquad (2.1)$$

That is, quoting [4], “the random variable $W_{q,\Sigma}$ appears in the large sample
behavior of Wald tests with $\Sigma$ as the asymptotic covariance matrix of an
estimator and the polynomial $q$ appearing in a Taylor approximation to
the function that defines the constraint to be tested.” Thus there are many
applications in which a distribution theory for $W_{q,\Sigma}$ is needed. These include
contingency tables [6-8], graphical models [3, Chapter 4], and the testing
of so-called “tetrad constraints” in factor analysis [4, 9]. See [4] for more
applications and an extensive list of references.
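The two expressions in (2.1) can be compared numerically. The sketch below assumes a monomial $q(x) = x_1^{\alpha_1} \cdots x_m^{\alpha_m}$ with nonnegative integer exponents (so that it is a polynomial, as in the text), evaluated at a point with every $x_j \ne 0$; the covariance matrix, the point, and the function names are hypothetical. It evaluates $W_{q,\Sigma}$ once from $q$ and $\nabla q$, and once from $\nabla \log q = (\alpha_1/x_1, \ldots, \alpha_m/x_m)$.

```python
# Illustrative sketch; function names and example values are hypothetical.
import math


def quad_form(v, sigma):
    """v Sigma v^T for a row vector v."""
    m = len(v)
    return sum(v[i] * sigma[i][j] * v[j] for i in range(m) for j in range(m))


def monomial(x, alpha):
    """q(x) = prod_j x_j ** alpha_j (integer exponents, so negative x_j are fine)."""
    return math.prod(xj ** aj for xj, aj in zip(x, alpha))


def grad_monomial(x, alpha):
    """Gradient of q: dq/dx_j = alpha_j * q(x) / x_j, valid when every x_j != 0."""
    q = monomial(x, alpha)
    return [aj * q / xj for xj, aj in zip(x, alpha)]


def wald_from_gradient(x, alpha, sigma):
    """Middle expression of (2.1): q(x)^2 / (grad q Sigma grad q^T)."""
    return monomial(x, alpha) ** 2 / quad_form(grad_monomial(x, alpha), sigma)


def wald_from_log_gradient(x, alpha, sigma):
    """Rightmost expression of (2.1): 1 / (grad log q Sigma grad log q^T)."""
    g = [aj / xj for xj, aj in zip(x, alpha)]
    return 1.0 / quad_form(g, sigma)


if __name__ == "__main__":
    sigma = [[1.0, 0.6, -0.4], [0.6, 2.0, 0.5], [-0.4, 0.5, 1.5]]
    x, alpha = [0.7, -1.3, 2.1], [1, 2, 1]
    print(wald_from_gradient(x, alpha, sigma), wald_from_log_gradient(x, alpha, sigma))
```

The two printed numbers should agree up to floating-point rounding, since $\nabla q = q\,\nabla \log q$ wherever $q \ne 0$.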

When $X \sim N(0, \Sigma)$, by the arguments presented in Section 6 of [4],
Theorem 1.1 implies the following result, also conjectured in [4], on the quadratic
forms for Gaussian random variables.


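Theorem 2.1. Let $\Sigma = \{\sigma_{ij}\} \ge 0$ with $\sigma_{jj} > 0$ for all $j = 1, \ldots, m$,
and $X = (X_1, \ldots, X_m) \sim N(0, \Sigma)$. If $q(x_1, \ldots, x_m) = x_1^{\alpha_1} \cdots x_m^{\alpha_m}$ with
nonnegative real exponents $\alpha_1, \ldots, \alpha_m$ such that $\sum_{j=1}^m \alpha_j > 0$, then

$$W_{q,\Sigma} \sim \frac{1}{\left(\sum_{j=1}^m \alpha_j\right)^2}\,\chi_1^2,$$

where $\chi_1^2$ denotes a standard chi-squared variable with 1 degree of freedom.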

An obvious surprising aspect of Theorem 2.1 is that the exact
distribution of $W_{q,\Sigma}$ is free of $\Sigma$. A consequential but somewhat hidden surprise is
revealed by expressing Theorem 2.1 in the following equivalent form.


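Theorem 2.2. Let $\Sigma$ be the same as in Theorem 2.1. For any $w = (w_1, \ldots, w_m)$
satisfying (1.1), and $X \sim N(0, \Sigma)$, we have

$$\left(\frac{w_1}{X_1}, \ldots, \frac{w_m}{X_m}\right) \Sigma \left(\frac{w_1}{X_1}, \ldots, \frac{w_m}{X_m}\right)^\top \sim \chi_1^{-2}, \qquad (2.2)$$

where $\chi_1^{-2}$ denotes an inverse chi-squared variable with 1 degree of freedom.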
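A Monte Carlo sketch of Theorem 2.2, under the same kind of assumptions as before: a hypothetical positive definite $\Sigma$, flat Dirichlet weights drawn independently of $X$, and hypothetical helper names. The reference CDF uses $P(\chi_1^{-2} \le q) = P(|N(0,1)| \ge q^{-1/2}) = \operatorname{erfc}(1/\sqrt{2q})$ for $q > 0$.

```python
# Illustrative sketch; helper names and the example Sigma are hypothetical.
import math
import random


def cholesky(sigma):
    """Lower-triangular L with L L^T = sigma (sigma must be positive definite)."""
    m = len(sigma)
    L = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = sigma[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def gaussian_vector(L, rng):
    """One draw from N(0, L L^T)."""
    z = [rng.gauss(0.0, 1.0) for _ in range(len(L))]
    return [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(L))]


def quadratic_statistic(x, w, sigma):
    """Left-hand side of (2.2): v Sigma v^T with v = (w_1/x_1, ..., w_m/x_m)."""
    v = [wj / xj for wj, xj in zip(w, x)]
    m = len(v)
    return sum(v[i] * sigma[i][j] * v[j] for i in range(m) for j in range(m))


def inv_chi2_1_cdf(q):
    """CDF of chi_1^{-2} (standard Levy): erfc(1 / sqrt(2 q)) for q > 0."""
    return math.erfc(1.0 / math.sqrt(2.0 * q)) if q > 0 else 0.0


def draw_statistic(sigma, L, rng):
    """One draw of the statistic with flat Dirichlet weights independent of X."""
    g = [rng.expovariate(1.0) for _ in range(len(sigma))]
    total = sum(g)
    w = [gj / total for gj in g]
    return quadratic_statistic(gaussian_vector(L, rng), w, sigma)


if __name__ == "__main__":
    rng = random.Random(3)
    sigma = [[1.0, 0.6, -0.4], [0.6, 2.0, 0.5], [-0.4, 0.5, 1.5]]
    L = cholesky(sigma)
    stats = [draw_statistic(sigma, L, rng) for _ in range(20_000)]
    for q in (0.5, 1.0, 4.0, 20.0):
        emp = sum(s <= q for s in stats) / len(stats)
        print(f"P(stat <= {q}): empirical {emp:.3f}, chi_1^-2 {inv_chi2_1_cdf(q):.3f}")
```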


These two results are equivalent when we observe that for $q(x_1, \ldots, x_m) =
x_1^{\alpha_1} \cdots x_m^{\alpha_m}$, $\nabla \log q = (\alpha_1/x_1, \ldots, \alpha_m/x_m)$. Theorem 2.2 therefore is merely a
re-expression of Theorem 2.1 using the rightmost expression of (2.1), and by
letting $w_j = \alpha_j / \sum_{k=1}^m \alpha_k$, $j = 1, \ldots, m$. (The generalization to random but
independent $w_j$'s follows the conditioning argument discussed previously.)
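Written out, the equivalence is a one-line rescaling. Here $A = \sum_{k=1}^m \alpha_k > 0$ is shorthand introduced only for this display, and $w_j = \alpha_j / A$:

$$
\begin{aligned}
\nabla \log q(X) &= \left(\frac{\alpha_1}{X_1}, \ldots, \frac{\alpha_m}{X_m}\right) = A \left(\frac{w_1}{X_1}, \ldots, \frac{w_m}{X_m}\right), \\
W_{q,\Sigma}^{-1} &= \nabla \log q(X)\,\Sigma\,\nabla \log q(X)^\top
 = A^2 \left(\frac{w_1}{X_1}, \ldots, \frac{w_m}{X_m}\right) \Sigma \left(\frac{w_1}{X_1}, \ldots, \frac{w_m}{X_m}\right)^\top .
\end{aligned}
$$

Hence $W_{q,\Sigma} \sim A^{-2}\chi_1^2$ holds exactly when the quadratic form in (2.2) is distributed as $1/\chi_1^2 = \chi_1^{-2}$. Conversely, any fixed $w$ satisfying (1.1) arises from the exponents $\alpha_j = w_j$, for which $A = 1$.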
When $\Sigma = I_{m \times m}$, Theorem 2.2 recovers the not-so-well-known “super
Cauchy phenomenon”, i.e., the average of $m$ i.i.d. $\chi_1^{-2}$ variables is distributed exactly
as $m$ times a single $\chi_1^{-2}$, by taking $w_j = m^{-1}$ for all $j$'s in Theorem 2.2. The
$\chi_1^{-2}$ distribution is also known as the standard Levy distribution (with
location parameter equal to 0 and scale parameter 1; see [1], p. 33). This result
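can easily be verified via the characteristic function of the $\chi_1^{-2}$ distribution,

$$\phi(t) = e^{-\sqrt{-2it}},$$

because the average of $m$ i.i.d. copies has characteristic function
$\phi^m(t/m) = e^{-m\sqrt{-2it/m}} = e^{-\sqrt{-2imt}} = \phi(mt)$, which is the characteristic
function of $m$ times a single copy. We call this “super Cauchy phenomenon”
because it says that for an i.i.d. Levy sample of size $m$, their average is $m$
times more variable than any one of them, exceeding the case of Cauchy
where the average has the same variability as a single variable. That is, if
we denote the $\alpha$-percentile of the average of $m$ i.i.d. samples by $p_\alpha^{(m)}$, then
for the Cauchy sample we have $p_\alpha^{(m)} = p_\alpha^{(1)}$, but for the Levy sample, we
have $p_\alpha^{(m)} = m\,p_\alpha^{(1)}$.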
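Both halves of the super Cauchy phenomenon can be checked in a few lines. The sketch below (hypothetical function names) evaluates $\phi(t) = e^{-\sqrt{-2it}}$ on the principal branch of the complex square root and compares $\phi^m(t/m)$ with $\phi(mt)$. It then tests $p_\alpha^{(m)} = m\,p_\alpha^{(1)}$ by simulation, using the quantile formula $p_\alpha^{(1)} = 1/[\Phi^{-1}(1 - \alpha/2)]^2$, which follows from $P(1/X^2 \le x) = 2[1 - \Phi(x^{-1/2})]$ for $X \sim N(0,1)$.

```python
# Illustrative sketch; function names are hypothetical.
import cmath
import random
from statistics import NormalDist


def levy_cf(t: float) -> complex:
    """Characteristic function of chi_1^{-2}: exp(-sqrt(-2 i t)), principal square root."""
    return cmath.exp(-cmath.sqrt(-2j * t))


def levy_quantile(alpha: float) -> float:
    """alpha-quantile of 1/X^2 with X ~ N(0,1), for 0 < alpha < 1."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return 1.0 / z ** 2


def levy_average(m: int, rng: random.Random) -> float:
    """Average of m i.i.d. chi_1^{-2} draws, each generated as 1 / N(0,1)^2."""
    return sum(1.0 / rng.gauss(0.0, 1.0) ** 2 for _ in range(m)) / m


if __name__ == "__main__":
    m, t = 5, 0.8
    print(abs(levy_cf(t / m) ** m - levy_cf(m * t)))  # ~ 0 up to rounding
    rng = random.Random(4)
    avgs = [levy_average(m, rng) for _ in range(20_000)]
    for alpha in (0.25, 0.5, 0.75):
        # Fraction of averages below m times the single-variable quantile.
        emp = sum(a <= m * levy_quantile(alpha) for a in avgs) / len(avgs)
        print(alpha, round(emp, 3))
```

Each printed fraction should be close to its $\alpha$; using $p_\alpha^{(1)}$ itself as the threshold instead of $m\,p_\alpha^{(1)}$ would give visibly smaller fractions.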

Clearly the characteristic function approach does not apply when $\Sigma$ is not
diagonal, but nevertheless Theorem 2.2 says that the above distributional
result generalizes when $\Sigma$ goes beyond the diagonal case. At first glance,
result (2.2) might seem to be wishful thinking by a novice to probability or
algebra, who (mistakenly) treats $X^{-1}\Sigma(X^{-1})^\top$, where $X^{-1} = (X_1^{-1}, \ldots, X_m^{-1})$,
as $(X\Sigma^{-1}X^\top)^{-1}$, which would then permit him to use the usual
standardization trick by letting $Z = \Sigma^{-1/2}X^\top \sim N(0, I_{m \times m})$. However, this would have led
him to guess that the left-hand side of (2.2), when $w_j = m^{-1}$, $j = 1, \ldots, m$,
is distributed as $m^{-2}(Z^\top Z)^{-1}$, which would then be $m^{-2}\chi_m^{-2}$, not $\chi_1^{-2}$.

It is instructive to express the left-hand side of (2.2) as the average of
$m^2$ terms when we take $w_j = m^{-1}$:

$$\frac{1}{m^2} \sum_{i=1}^m \sum_{j=1}^m \frac{\sigma_{ij}}{X_i X_j} \sim \chi_1^{-2}. \qquad (2.3)$$


This is a rather remarkable result because the left-hand side can only be
made invariant (algebraically) to the variances $\sigma_{jj}$ by expressing $X_j =
\sqrt{\sigma_{jj}}\,X_j'$ with the variance of $X_j'$ equal to 1, but not to the correlations $\rho_{ij} =
\sigma_{ij}/\sqrt{\sigma_{ii}\sigma_{jj}}$. Yet, (2.2) says that it is actually a pivotal quantity for $\{\rho_{ij}\}$.
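The algebraic part of this remark can be made concrete. The sketch below (hypothetical names and example values) evaluates the left-hand side of (2.3) at a point $x$, then passes to unit variances, $x_j' = x_j/\sqrt{\sigma_{jj}}$ together with the correlation matrix $\{\rho_{ij}\}$, and checks that the value is unchanged. The correlations themselves cannot be removed this way.

```python
# Illustrative sketch; names and example values are hypothetical.
import math


def lhs_23(x, sigma):
    """Left-hand side of (2.3): (1/m^2) * sum_{i,j} sigma_ij / (x_i x_j)."""
    m = len(x)
    return sum(sigma[i][j] / (x[i] * x[j]) for i in range(m) for j in range(m)) / m ** 2


def standardize(x, sigma):
    """Return x'_j = x_j / sqrt(sigma_jj) and rho_ij = sigma_ij / sqrt(sigma_ii sigma_jj)."""
    sd = [math.sqrt(sigma[j][j]) for j in range(len(x))]
    x_std = [xj / s for xj, s in zip(x, sd)]
    rho = [[sigma[i][j] / (sd[i] * sd[j]) for j in range(len(x))] for i in range(len(x))]
    return x_std, rho


if __name__ == "__main__":
    sigma = [[1.0, 0.6, -0.4], [0.6, 2.0, 0.5], [-0.4, 0.5, 1.5]]
    x = [0.7, -1.3, 2.1]
    x_std, rho = standardize(x, sigma)
    print(lhs_23(x, sigma), lhs_23(x_std, rho))  # equal up to rounding
```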

There are also some works on multivariate Cauchy densities that are
relevant to our problem. For instance, in [12] and [10], the authors studied the
distribution theory for the ratio of two Gaussian random variables. Ferguson
[5] derived a general result for the characteristic function of a multivariate
Cauchy distribution. In [11], the authors studied a generalization of the
bivariate Cauchy distribution. McCullagh [13] showed that it is natural to
parametrize the family of Cauchy($\mu$, $\sigma$) distributions in the complex plane,
as this location-scale family is closed under Möbius transformations. Finally,
we remark that Drton and Xiao [4] did not prove the $m = 2$ case of Theorem
1.1 directly; instead, they first proved a special case of Theorem 2.1 (with
$m = 2$, i.e., $q(x_1, x_2) = x_1^{\alpha_1} x_2^{\alpha_2}$), generalizing a similar result of Glonek [8]. Drton and
Xiao then obtained the conclusion of Theorem 1.1 for $m = 2$ as a corollary
using change of variables.

3. Proof. The key idea of our proof relies on the following result from 
[14], as reformulated in [15]. The proof of this lemma as given in [14] is 
short, and it also relies on the Residue theorem for contour integration. 
Very recently, the author of [2] gave a geometric proof for the case of m = 2. 

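Lemma 3.1. Let $\Theta_1 \sim \mathrm{Unif}(-\pi, \pi]$, and let $\{w_1, \ldots, w_m\}$ be independent
of $\Theta_1$, where $w_j \ge 0$ and $\sum_{j=1}^m w_j = 1$. Then for any $\{u_1, \ldots, u_m\}$, where
$u_j \in \mathbb{R}$,

$$\sum_{j=1}^m w_j \tan(\Theta_1 + u_j) \sim \mathrm{Cauchy}(0,1).$$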


Intuitively, if $\Theta_1 \sim \mathrm{Unif}(-\pi, \pi]$, then for any constant $u_j$, $(\Theta_1 + u_j)$
mod $(2\pi) \sim \mathrm{Unif}(-\pi, \pi]$, and hence $\tan(\Theta_1 + u_j) \sim$ Cauchy(0,1). The
significance of this lemma is that any convex combination of these dependent
Cauchy(0,1) variables is still distributed as Cauchy(0,1). As we shall see
below, Theorem 1.1 is a direct consequence of this remarkable result. We
first prove Theorem 1.1 when $\Sigma$ is strictly positive definite, i.e., $\Sigma > 0$, and
then invoke a limiting argument to cover the cases with $\Sigma \ge 0$.
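Lemma 3.1 is easy to probe numerically as well. In the sketch below, the shifts $u_j$ are arbitrary fixed constants and the weights are redrawn from a flat Dirichlet distribution independently of $\Theta_1$; both choices, and the helper names, are illustrative assumptions. Although the summands $\tan(\Theta_1 + u_j)$ are all driven by a single uniform angle, the empirical CDF of their convex combination should track the Cauchy(0,1) CDF.

```python
# Illustrative sketch; helper names and the shifts u are hypothetical.
import math
import random


def lemma_sum(theta1, w, u):
    """sum_j w_j tan(theta1 + u_j), as in Lemma 3.1."""
    return sum(wj * math.tan(theta1 + uj) for wj, uj in zip(w, u))


def draw_lemma_sum(u, rng):
    """One draw with Theta_1 uniform on (-pi, pi) and weights independent of Theta_1."""
    theta1 = rng.uniform(-math.pi, math.pi)
    g = [rng.expovariate(1.0) for _ in u]
    total = sum(g)
    w = [gj / total for gj in g]
    return lemma_sum(theta1, w, u)


def cauchy_cdf(z):
    """CDF of Cauchy(0,1)."""
    return 0.5 + math.atan(z) / math.pi


if __name__ == "__main__":
    rng = random.Random(6)
    u = [0.0, 1.0, 2.5, -2.0]  # arbitrary fixed shifts
    draws = [draw_lemma_sum(u, rng) for _ in range(20_000)]
    for q in (-2.0, 0.0, 0.5, 2.0):
        emp = sum(d <= q for d in draws) / len(draws)
        print(q, round(emp, 3), round(cauchy_cdf(q), 3))
```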

Proof of Theorem 1.1. When $\Sigma > 0$, we write $\Sigma^{-1} = \{b_{jk}\}$. The joint
density of $(X, Y)$ then can be written as

$$f_{X,Y}(x, y) = K \exp\left\{-\frac{1}{2}\left(\sum_{j=1}^m b_{jj}(x_j^2 + y_j^2) + 2\sum_{1 \le j < k \le m} b_{jk}(x_j x_k + y_j y_k)\right)\right\},$$

where $x, y \in \mathbb{R}^m$ and $K$ is a constant that depends only on $m$ and $\Sigma$.
Let us make the transformation $(X_j, Y_j) = (R_j \sin\Theta_j, R_j \cos\Theta_j)$, where
$0 \le R_j < \infty$ and $\Theta_j \in (-\pi, \pi]$. Then the joint density of $R = (R_1, \ldots, R_m)$
and $\Theta = (\Theta_1, \ldots, \Theta_m)$ is


$$f_{R,\Theta}(r, \theta) \propto \exp\left\{-\frac{1}{2}\left(\sum_{j=1}^m b_{jj} r_j^2 + 2\sum_{1 \le j < k \le m} b_{jk} r_j r_k \cos(\theta_j - \theta_k)\right)\right\} \prod_{j=1}^m r_j, \qquad (3.1)$$

for $r \in [0, \infty)^m$ and $\theta \in (-\pi, \pi]^m$. The term $\prod_{j=1}^m r_j$ in Equation (3.1) is
the Jacobian of the $(X, Y) \to (R, \Theta)$ transformation.

Fig 1. For every value of $\theta_1 \ne 0$, the map $\theta_j \mapsto u_j$ ($j \ge 2$) is a disjoint union of two
lines, and is one-to-one. The figure shows the graph of $(\theta_j, u_j)$ for a fixed value $\theta_1 \ne 0$ (plotted as
two solid lines). When $\theta_1 = 0$, $u_j = \theta_j$ (plotted as a dashed line).

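We then make a further transformation, $F: (-\pi, \pi]^m \to (-\pi, \pi]^m$, with
$F(\theta_1, \ldots, \theta_m) = (\theta_1, u_2, \ldots, u_m)$, where

$$u_j = (\theta_j - \theta_1) + 2\pi\left[\mathbf{1}\{\theta_j - \theta_1 \le -\pi\} - \mathbf{1}\{\theta_j - \theta_1 > \pi\}\right], \quad 2 \le j \le m. \qquad (3.2)$$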

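This is a form of $u_j = (\theta_j - \theta_1) \bmod (2\pi)$, but with the assurance that the
support of $u_j$ is $(-\pi, \pi]$ regardless of the value of $\theta_1$, that $u_j - u_k \equiv
\theta_j - \theta_k \pmod{2\pi}$, and that $\theta_j \equiv \theta_1 + u_j \pmod{2\pi}$. The map $F$ is
one-to-one, as shown in Figure 1. Furthermore, the set of points where $F$ is
not differentiable is contained in

$$\{\theta \in (-\pi, \pi]^m : \theta_j - \theta_1 \in \{-\pi, \pi\} \text{ for some } j \ge 2\},$$

as can be seen from Figure 1. Clearly this set has Lebesgue measure zero.
Outside this set, each $u_j$ equals $\theta_j - \theta_1$ plus a locally constant multiple of $2\pi$, so
the Jacobian matrix of $F$ is lower triangular with unit diagonal and $|\det(\partial F/\partial\theta)| = 1$.
Thus the Jacobian of the map is 1 for all $\theta \in (-\pi, \pi]^m$ except on the above measure-zero set.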
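The wrapping in (3.2) is easy to get wrong at the boundaries, so here is a direct transcription (function names hypothetical) together with checks of the properties claimed above: $u_j \in (-\pi, \pi]$; $u_j$ differs from $\theta_j - \theta_1$ by a multiple of $2\pi$, so $\theta_1 + u_j$ and $\theta_j$ have the same sine and cosine; the map $\theta_j \mapsto u_j$ is one-to-one for fixed $\theta_1$; and away from the cut its derivative is 1.

```python
# Illustrative sketch of (3.2); function names are hypothetical.
import math


def wrap_u(theta_j: float, theta_1: float) -> float:
    """u_j from (3.2): theta_j - theta_1 shifted by a multiple of 2*pi into (-pi, pi]."""
    d = theta_j - theta_1
    shift = (1 if d <= -math.pi else 0) - (1 if d > math.pi else 0)
    return d + 2.0 * math.pi * shift


def grid(n: int):
    """n interior points of (-pi, pi], avoiding the endpoints."""
    return [-math.pi + (k + 0.5) * 2.0 * math.pi / n for k in range(n)]


if __name__ == "__main__":
    theta_1 = 2.0
    for theta_j in (-3.0, -1.0, 0.5, 3.0):
        u = wrap_u(theta_j, theta_1)
        # Last column: cos(theta_1 + u_j) - cos(theta_j), which should vanish.
        print(theta_j, round(u, 4), round(math.cos(theta_1 + u) - math.cos(theta_j), 12))
```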

Set $U_1 = 0$ and denote $U = (U_1, U_2, \ldots, U_m)$. Since $\cos(W_1) = \cos(W_2)$
for any $W_1 \equiv W_2 \pmod{2\pi}$, we can write the joint density in the new
coordinates as

$$f_{R,\Theta_1,U}(r, \theta_1, u) \propto \exp\left\{-\frac{1}{2}\left(\sum_{j=1}^m b_{jj} r_j^2 + 2\sum_{1 \le j < k \le m} b_{jk} r_j r_k \cos(u_k - u_j)\right)\right\} \prod_{j=1}^m r_j$$

with $r \in [0, \infty)^m$, $\theta_1 \in (-\pi, \pi]$, $u_1 = 0$ and $u_2, \ldots, u_m \in (-\pi, \pi]$. Since this
density does not involve $\theta_1$ and its support is a product set, the only
observations we need from the above line are: (i) $\Theta_1$ is independent of $U$
and (ii) $\Theta_1 \sim \mathrm{Unif}(-\pi, \pi]$. But $Z$ in (1.2) can be written as

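$$Z = \sum_{j=1}^m w_j \frac{X_j}{Y_j} = \sum_{j=1}^m w_j \tan(\Theta_j) = \sum_{j=1}^m w_j \tan(\Theta_1 + U_j),$$

because $\tan(W_1) = \tan(W_2)$ for any $W_1 \equiv W_2 \pmod{2\pi}$. Since $\Theta_1$ is
independent of $U$, and $w$ is independent of $(X, Y)$ and hence of $(\Theta_1, U)$,
Lemma 3.1 applied conditionally on $U$ yields that $Z \sim$ Cauchy(0,1).
It follows immediately that $Z$ is also marginally distributed as Cauchy(0,1),
completing the proof when $\Sigma > 0$.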

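When we relax the assumption $\Sigma > 0$ to $\Sigma \ge 0$, $\Sigma^{-1}$ may not exist.
However, for any $n \in \mathbb{N}$, $\Sigma^{(n)} = \Sigma + n^{-1} I_{m \times m} > 0$. Let $X^{(n)} = (X_1^{(n)}, \ldots, X_m^{(n)})$
and $Y^{(n)} = (Y_1^{(n)}, \ldots, Y_m^{(n)})$ be i.i.d. from $N(0, \Sigma^{(n)})$. As $n \to \infty$, we have
$(X^{(n)}, Y^{(n)}) \Rightarrow (X, Y)$, where $\Rightarrow$ indicates convergence in distribution.
Next, treating $w$ as fixed (by the conditioning argument above), the mapping
$C: \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R}$ defined by

$$C(x, y) = \sum_{j=1}^m w_j \frac{x_j}{y_j}$$

is continuous, except when $y \in B$ with

$$B = \left\{(y_1, \ldots, y_m) : \min_{1 \le j \le m} |y_j| = 0\right\}.$$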

Now the result follows from the continuous mapping theorem. Indeed, since
$(X^{(n)}, Y^{(n)}) \Rightarrow (X, Y)$, the continuous mapping theorem yields that

$$Z^{(n)} = C(X^{(n)}, Y^{(n)}) \Rightarrow C(X, Y) = Z \qquad (3.3)$$

as $n \to \infty$, provided the set of discontinuity points of $C$ has probability zero
under the distribution of $(X, Y)$. Since $Y_j \sim N(0, \sigma_{jj})$ where $\sigma_{jj} > 0$ by our
assumption, we have

$$P(Y \in B) = P\left(\min_{1 \le l \le m} |Y_l| = 0\right) \le \sum_{l=1}^m P(|Y_l| = 0) = 0,$$

verifying (3.3). By our previous argument, we know that $Z^{(n)} \sim$ Cauchy(0,1)
for all $n \in \mathbb{N}$, and hence (3.3) implies that $Z \sim$ Cauchy(0,1). □
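The limiting argument covers singular $\Sigma$, and the conclusion can be checked on a concrete rank-deficient example. Because a Cholesky factorization may fail for a singular matrix, the sketch below samples through a $3 \times 2$ factor $A$ with $\Sigma = AA^\top$. With the hypothetical factor chosen here, the third row of $A$ is the sum of the first two, so $X_3 = X_1 + X_2$; $\Sigma$ then has rank 2 while its diagonal $(1, 1, 3)$ is positive, so Theorem 1.1 applies. The fixed weights and helper names are also illustrative.

```python
# Illustrative sketch; the factor A, the weights W and all names are hypothetical.
import math
import random

# Rank-2 factor: row 3 = row 1 + row 2, so X_3 = X_1 + X_2 and Sigma = A A^T is singular.
A = [[1.0, 0.0], [0.5, math.sqrt(0.75)], [1.5, math.sqrt(0.75)]]
W = [0.2, 0.3, 0.5]  # fixed weights on the simplex


def sigma_from_factor(a):
    """Sigma = A A^T."""
    return [[sum(p * q for p, q in zip(ri, rj)) for rj in a] for ri in a]


def factor_draw(a, rng):
    """One draw from N(0, A A^T) using len(a[0]) independent standard normals."""
    z = [rng.gauss(0.0, 1.0) for _ in range(len(a[0]))]
    return [sum(p * q for p, q in zip(row, z)) for row in a]


def draw_z_singular(rng, a=A, w=W):
    """Z = sum_j w_j X_j / Y_j with X, Y i.i.d. N(0, A A^T)."""
    x, y = factor_draw(a, rng), factor_draw(a, rng)
    return sum(wj * xj / yj for wj, xj, yj in zip(w, x, y))


def cauchy_cdf(z):
    """CDF of Cauchy(0,1)."""
    return 0.5 + math.atan(z) / math.pi


if __name__ == "__main__":
    print(sigma_from_factor(A))
    rng = random.Random(8)
    zs = [draw_z_singular(rng) for _ in range(20_000)]
    for q in (-2.0, 0.0, 0.5, 2.0):
        print(q, round(sum(z <= q for z in zs) / len(zs), 3), round(cauchy_cdf(q), 3))
```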